Plagiarism Detection Considering Frequent Senses Using Graph Based Research Document Clustering
Abstract
A new graph-based research document clustering technique (GRDClust) is introduced that relies on frequent senses rather than the frequent keywords used by traditional document clustering techniques. GRDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find frequent subgraphs, which reflect frequent senses based on support and confidence. We highlight the different types of plagiarism and address plagiarism of text, plagiarism of ideas, mosaic plagiarism, self-plagiarism, and duplicate publication. Documents are screened for plagiarism by identifying the suspect terms. An act of plagiarism can have several repercussions; when an article merely scores low on clarity or lacks conciseness, however, the deficiency is typically unintentional.
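The Apriori step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each document-graph is reduced to a set of `(child, parent)` sense edges, treats a "subgraph" simply as a set of edges without checking connectivity, and filters by support only (confidence is left out). The function name `frequent_edge_sets` and the toy sense labels are hypothetical.

```python
from itertools import combinations


def frequent_edge_sets(doc_graphs, min_support):
    """Apriori-style search for edge sets shared by many document-graphs.

    doc_graphs: list of sets of edges; each edge is a (child, parent) tuple.
    min_support: minimum fraction of documents that must contain an edge set.
    Returns a dict mapping frozenset-of-edges -> support.
    Assumption: a subgraph is modelled as a plain edge set (no connectivity test).
    """
    n_docs = len(doc_graphs)

    def support(edge_set):
        # Fraction of documents whose graph contains every edge in edge_set.
        return sum(1 for g in doc_graphs if edge_set <= g) / n_docs

    # Level 1: keep the single edges that are frequent on their own.
    current = {}
    for edge in set().union(*doc_graphs):
        s = support(frozenset([edge]))
        if s >= min_support:
            current[frozenset([edge])] = s

    result = dict(current)
    k = 1
    while current:
        # Join frequent k-edge sets into (k+1)-edge candidates.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Apriori pruning: every k-edge subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k)):
                candidates.add(union)
        scored = {c: support(c) for c in candidates}
        current = {c: s for c, s in scored.items() if s >= min_support}
        result.update(current)
        k += 1
    return result


# Toy document-graphs built from made-up sense edges.
docs = [
    {("dog", "canine"), ("canine", "animal")},
    {("dog", "canine"), ("canine", "animal"), ("cat", "feline")},
    {("cat", "feline"), ("feline", "animal")},
]
result = frequent_edge_sets(docs, min_support=0.6)
```

In this toy run, the two-edge set linking "dog" to "animal" through "canine" appears in two of the three documents and is therefore kept, while any set containing the `("feline", "animal")` edge is dropped. Documents could then be grouped by the frequent edge sets they share, which is the role frequent senses play in the clustering described above.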
Similar resources
Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
With due respect to the authors’ rights, plagiarism detection is one of the critical problems in the field of text mining that many researchers are interested in. This issue is considered a serious one in high academic institutions. There exist language-free tools which do not yield any reliable results since the special features of every language are ignored in them. Considering the paucit...
APRIORI APPROACH TO GRAPH-BASED CLUSTERING OF TEXT DOCUMENTS by Mahmud
This thesis report introduces a new technique of document clustering based on frequent senses. The developed system, named GDClust (Graph-Based Document Clustering) [1], works with frequent senses rather than dealing with frequent keywords used in traditional text mining techniques. GDClust presents text documents as hierarchical document-graphs and uses an Apriori paradigm to find the frequent...
Approaches for Intrinsic and External Plagiarism Detection - Notebook for PAN at CLEF 2011
Plagiarism detection has been considered as a classification problem which can be approximated with intrinsic strategies, considering self-based information from a given document, and external strategies, considering comparison techniques between a suspicious document and different sources. In this work, both intrinsic and external approaches for plagiarism detection are presented. First, the m...
Semantically-Guided Clustering of Text Documents via Frequent Subgraphs Discovery
In this paper we introduce and analyze two improvements to GDClust [1], a system for document clustering based on the co-occurrence of frequent subgraphs. GDClust (Graph-Based Document Clustering) works with frequent senses derived from the constraints provided by the natural language rather than working with the co-occurrences of frequent keywords commonly used in the vector space model of doc...
Plagiarism Alignment Detection by Merging Context Seeds
We describe the algorithm we submitted to the text alignment sub-task of the plagiarism detection task in the PAN2014 challenge, which achieved a plagdet score of 0.855. By extracting contextual features for each document character and grouping those that are relevant for a given pair of documents, we generate seeds of atomic plagiarism cases. These are then merged by an agglomerative single-linkage str...